Enron Corpus

The Enron Corpus is a large database of over 600,000 emails generated by 158 employees[1] of the Enron Corporation and acquired by the Federal Energy Regulatory Commission during its investigation after the company's collapse.[2] A copy of the database was subsequently purchased for $10,000 by Andrew McCallum, a computer scientist at the University of Massachusetts.[3] He released this copy to researchers, providing a trove of data that has been used for studies of social networks and computer analysis of language. The corpus is "unique" in that it is one of the only mass collections of "real" emails publicly available for study. Such collections are typically bound by numerous privacy and legal restrictions that make them prohibitively difficult to access.[3]

References

  1. ^ Klimt, Bryan and Yiming Yang. "The Enron Corpus: A New Dataset for Email Classification Research".
  2. ^ "The Enron Email Corpus". Retrieved March 5, 2011.
  3. ^ a b Markoff, John. "Armies of Expensive Lawyers, Replaced by Cheaper Software". The New York Times, March 5, 2011, p. A1.

External links